Abstract

Background: Molecular clock methodologies allow for the estimation of divergence times across a variety of 
organisms; this can be particularly useful for groups lacking robust fossil histories, such as microbial eukaryotes with 
few distinguishing morphological traits. Here we have used a Bayesian molecular clock method under three distinct 
clock models to estimate divergence times within oomycetes, a group of fungal-like eukaryotes that are ubiquitous in 
the environment and include a number of devastating pathogenic species. The earliest fossil evidence for oomycetes 
comes from the Lower Devonian (~400 Ma); however, the taxonomic affinities of these fossils are unclear.

Results: Complete genome sequences were used to identify orthologous proteins among oomycetes, diatoms, 
and a brown alga, with a focus on conserved regulators of gene expression such as DNA and histone modifiers 
and transcription factors. Our molecular clock estimates place the origin of oomycetes by at least the mid-Paleozoic 
(~430-400 Ma), with the divergence between two major lineages, the peronosporaleans and saprolegnialeans, in the
early Mesozoic (~225-190 Ma). Divergence times estimated under the three clock models were similar, although only
the strict and random local clock models produced reliable estimates for most parameters. 

Conclusions: Our molecular timescale suggests that modern pathogenic oomycetes diverged well after the origin of 
their respective hosts, indicating that environmental conditions or perhaps horizontal gene transfer events, rather than 
host availability, may have driven lineage diversification. Our findings also suggest that the last common ancestor of 
oomycetes possessed a full complement of eukaryotic regulatory proteins, including those involved in histone 
modification, RNA interference, and tRNA and rRNA methylation; interestingly, no match to canonical DNA
methyltransferases could be identified in the oomycete genomes studied here. 

Keywords: Oomycetes, Divergence times, Bayesian inference, Molecular clock, Gene expression regulation 



Background 

Eukaryotic diversity is primarily microbial, with multicellu- 
larity restricted to a few distinct lineages (plants, animals, 
fungi, and some algae). While the Proterozoic fossil record 
contains an abundance of organic-walled, often ornamen- 
ted, microfossils interpreted as eukaryotes, evidence for the 
origins and diversification of specific lineages of microbial 
eukaryotes is rare, especially for those groups with few 
diagnostic morphological characters [1]. Molecular clock 
methods therefore provide the only avenue for elucidating 
the evolutionary history of some lineages. With the recognition
that a single-rate ("strict") molecular clock, as
originally proposed by Zuckerkandl and Pauling [2,3], was
often inadequate in light of rate variation among organisms,
early studies suggested the use of local clocks or the
removal of lineages that violated the assumption of rate 
homogeneity (reviewed in [4]). The continued develop- 
ment of molecular clock methodologies over the past two 
decades has allowed for the estimation of divergence times 
under more complex models of rate variation. Initial 
"relaxed clock" methods, such as non-parametric rate 
smoothing [5] and penalized likelihood [6], allowed rates 
to vary but sought to minimize large differences between 
parent and descendent branches. Additionally, Bayesian 
relaxed clock methods allow rates to vary among lineages 
but assume autocorrelation by drawing the rate of a 
descendent branch from a distribution whose mean is
determined by the rate of the parent branch [7,8]; other
Bayesian methods relax this assumption of autocorrelation 
for the co-estimation of phylogeny and divergence times 
[9]. Most recently, a random local clock model approach 
has been proposed which allows rate changes to occur 
along any branch in a phylogeny; this method allows users 
to directly test various local clock scenarios against a strict 
clock model of no rate changes [10]. 
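The difference between a strict clock, an autocorrelated relaxed clock, and an uncorrelated relaxed clock can be illustrated with a minimal simulation of branch rates. This is only a sketch under stated assumptions: the function names, the lognormal parameterization, and the example values (a mean rate of 0.002 and a standard deviation of 0.5) are illustrative choices and are not taken from any of the cited methods. The random local clock model is not simulated here.

```python
import math
import random


def strict_clock(n_branches, rate):
    """Strict clock: every branch shares one substitution rate."""
    return [rate] * n_branches


def uncorrelated_lognormal(n_branches, mean_rate, sigma, rng):
    """Uncorrelated relaxed clock: each branch rate is an independent
    lognormal draw; mu is set so the expected rate equals mean_rate."""
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def autocorrelated_lognormal(n_branches, root_rate, sigma, rng):
    """Autocorrelated relaxed clock along one root-to-tip path: each
    descendant rate is drawn from a lognormal whose mean is the parent rate."""
    rates, parent = [], root_rate
    for _ in range(n_branches):
        mu = math.log(parent) - 0.5 * sigma ** 2
        parent = rng.lognormvariate(mu, sigma)
        rates.append(parent)
    return rates


# Hypothetical example values, chosen only for illustration.
rng = random.Random(1)
print(strict_clock(4, 0.002))
print(uncorrelated_lognormal(4, 0.002, 0.5, rng))
print(autocorrelated_lognormal(4, 0.002, 0.5, rng))
```

In the autocorrelated case a rate change on one branch carries over to its descendants, whereas in the uncorrelated case each branch is drawn without reference to its parent.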

In addition to improved modeling of rate variation, 
newer molecular clock methods are also able to better 
incorporate calibration uncertainty into the estimation 
of divergence times. Early methods treated fossil calibra- 
tions as fixed points (from which rates were derived); 
newer methods utilize probability distributions to better reflect
the paleontological uncertainty of a fossil's phylogenetic
position in relation to modern organisms [11,12], as
well as variance around the numerical age of geologic for- 
mations. However, some authors have already shown that 
modeling fossil probability distributions under different as- 
sumptions can have significant impacts on divergence time 
estimation [13], illustrating that rate calibration is still an 
important source of potential error in molecular clock 
studies. 

In this study we have focused on the fungal-like oomy- 
cetes (Peronosporomycetes sensu [14]), a group of het- 
erotrophic eukaryotes closely related to diatoms, brown 
algae, and other stramenopiles [15]. A close relationship 
among stramenopiles, alveolates, and several photosyn- 
thetic eukaryotes with red algal-derived plastids was previ- 
ously suggested as the supergroup Chromalveolata [16]. 
However, molecular studies have supported a grouping of
stramenopiles and alveolates with the non-photosynthetic 
rhizarians ("SAR" sensu [17]), excluding other photosyn- 
thetic lineages; the recently revised eukaryote classification 
has now formalized the Sar supergroup [18]. Many oomycetes
are saprotrophic in aquatic and terrestrial ecosystems;
however, several devastating pathogens are known,
such as Phytophthora infestans, the causal agent of late 
blight in solanaceous plant hosts [19]. Some orders are 
primarily pathogenic, such as the Peronosporales and 
Albuginales, while others are composed of both patho- 
genic and saprotrophic members, such as the Pythiales, 
Saprolegniales, Leptomitales, and Rhipidiales [20]. Several 
basal lineages, such as the Eurychasmales and Haliphthor- 
ales, are known primarily as pathogens of marine algae 
and crustaceans, leading some to suggest that the oomycetes
may be "hard-wired" for pathogenic lifestyles [15].

The earliest robust fossil evidence of oomycetes comes 
from the Lower Devonian (Pragian, ~408 Ma) Rhynie
Chert [21]. Thick-walled, ornamented structures inter- 
preted as oogonium-antheridium complexes [22], as well 
as thin-walled polyoosporous oogonia [23], are well pre- 
served in association with degraded plant debris and 
cyanobacteria-dominated microbial mats. More recent
oomycete fossils occur in the Carboniferous, where evidence
for endophytic [24] and perhaps parasitic [25,26]
interactions with plant hosts is more compelling. Add- 
itionally, the fossil species Combresomyces cornifer ori- 
ginally described from Lower Carboniferous chert in 
central France [27] has also been identified in Middle 
Triassic silicified peat from Antarctica [28], providing an 
intriguing example of geographic range and morpho- 
logical stasis over roughly 90 million years of oomycete 
evolution [29]. 

This is the first study to estimate divergence times 
within the oomycetes using molecular clock methods. 
Previous studies have typically included a single repre- 
sentative within a larger study of eukaryotic evolution 
[30-32], or have used oomycetes to root the analysis 
[33,34]. As there is little a priori information on the 
tempo of evolution within oomycetes, here we estimate 
divergence times under three distinct molecular clock 
models: a single-rate strict clock, a relaxed clock with
uncorrelated rates modeled under a lognormal distribution
(UCLD), and a random local clock model. The
availability of several complete genome sequences for 
oomycetes, diatoms, and a brown alga allowed us to 
carefully curate a dataset of 40 orthologs for divergence 
time estimation; we chose to focus on known regulators 
of eukaryotic gene expression to investigate their pres- 
ence and level of conservation within pathogenic oomy- 
cetes. While the performance of the three models 
differed, the estimated divergence times suggested that 
oomycetes diverged from other stramenopiles by at least 
the mid-Paleozoic, and that two major lineages, the peronosporaleans
and saprolegnialeans, diverged in the
early Mesozoic, approximately 200 million years after the first appearance
of oomycetes in the fossil record.

Results 

Regulators of gene expression in oomycetes 

Complete genome sequences from eighteen species were examined (Table 1). A total of 70 genes
involved in the regulation of gene expression were examined for homology in Phytophthora
infestans (Table 2); homologs of two genes (Drosha-like; TFIIH, Ssl1 subunit) could not be
identified in P. infestans but were present in other oomycetes. In general, oomycetes possess a
full complement of canonical transcription factors and genes involved in chromatin modification,
including multiple histone acetyltransferases, deacetylases, and methyltransferases (Table 2).
Proteins known to be involved in post-transcriptional gene silencing [35] were identified in our
search, including homologs of Argonaute, Dicer, RNA-dependent RNA polymerase, double-stranded RNA
binding proteins, and an RNase III domain-containing protein (Table 2). A recent study has shown
that these genes are expressed and functional in P. infestans [36]. However, unlike the previous
study, we were able to identify a second Dicer-like homolog in the genomes of other oomycetes that
is absent in P. infestans; these sequences showed more similarity to human and Drosophila Drosha
proteins than to other Dicer homologs (data not shown). Two distinct groups of Argonaute proteins
were identified in the oomycetes, as well as two types of double-stranded RNA binding proteins
(Table 2). While no homologs to canonical eukaryotic DNA methyltransferases could be identified, a
homolog of DNA methyltransferase 1-associated protein was present in all the genomes analyzed
here. Several genes involved in RNA methylation were also found (Table 2).

Table 1 Species with complete genome sequences included in this study

| Species | Strain/version | Genome source | Reference |
|---|---|---|---|
| Achlya hypogyna | ATCC48635 | unpublished data | (unpublished) |
| Albugo laibachii | Nc14 | NCBI BLAST (http://blast.ncbi.nlm.nih.gov) | [37] |
| Ectocarpus siliculosus | Ec32 | BOGAS (http://bioinformatics.psb.ugent.be) | [38] |
| Fragilariopsis cylindrus | CCMP1102 v1.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | (unpublished) |
| Hyaloperonospora arabidopsidis | Emoy2 v8.3.2 | Virginia Bioinformatics Institute (http://www.vbi.vt.edu) | [39] |
| Phaeodactylum tricornutum | CCAP 1055/1 v2.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | [40] |
| Phytophthora capsici | LT1534 v11.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | [41] |
| Phytophthora cinnamomi | v1.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | (unpublished) |
| Phytophthora infestans | T30-4 | Broad Institute (http://www.broadinstitute.org) | [42] |
| Phytophthora parasitica | INRA-310 | Broad Institute (http://www.broadinstitute.org) | (unpublished) |
| Phytophthora ramorum | Pr-102 v1.1 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | [43] |
| Phytophthora sojae | P6497 v3.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | [43] |
| Pseudo-nitzschia multiseries | CLN-47 v1.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | (unpublished) |
| Pythium ultimum | BR144 v4.0 | Pythium Genome Database (http://pythium.plantbiology.msu.edu) | [44] |
| Saprolegnia parasitica | CBS223.65 | Broad Institute (http://www.broadinstitute.org) | [45] |
| Tetrahymena thermophila | SB210 v2008 | Tetrahymena Genome Database (http://ciliate.org) | [46] |
| Thalassiosira pseudonana | CCMP1335 v3.0 | DOE-Joint Genome Institute (http://www.jgi.doe.gov) | [47] |
| Thraustotheca clavata | ATCC34112 | unpublished data | (unpublished) |

Table 2 Conserved regulators of gene expression evaluated for divergence time analysis

| Gene | Domains^a | Reference^b |
|---|---|---|
| Chromatin modification | | |
| Anti-silencing factor Asf1 | asf1 | PITG_17091 |
| Brahma-like | HAS, SNF2 N-terminal, Helicase conserved C-terminal, Bromodomain | PITG_19037 |
| Chromodomain-containing protein (A) | 2x Chromo, SNF2 N-terminal, Helicase conserved C-terminal | PITG_15837 |
| Chromodomain-containing protein (B) | [PDZ, QLQ], 2x Chromo, SNF2 N-terminal, Helicase conserved C-terminal, PHD-finger, PHD-like (zf-HC5HC2H_2) | PITG_10083 |
| Chromodomain-containing protein (C) | PHD-finger, 2x Chromo, SNF2 N-terminal, Helicase conserved C-terminal | PITG_00140 |
| Chromodomain-containing protein (D) | [Chromo], Bromodomain, PHD, Chromo, [PDZ], SNF2 N-terminal, Helicase conserved C-terminal, PHD, [PHD], PHD-like (zf-HC5HC2H), PHD | PITG_03401 |
| CXXC zinc finger containing protein | [SNF2 N-terminal], 2x CXXC zinc finger, [FHA] | PITG_03547 |
| DNA methyltransferase 1-associated protein | [DNA methyltransferase 1-associated protein] | PITG_15785 |
| ESA1-like histone acetyltransferase | Tudor-knot RNA binding, MOZ/SAS | PITG_01456 |
| GCN5-like histone acetyltransferase | GNAT Acetyltransferase, Bromodomain | PITG_20197 |
| HAT1-like histone acetyltransferase | HAT1 N-terminus, [GNAT Acetyltransferase] | PITG_00186 |
| KAT11 domain histone acetyltransferase (A) | TAZ zinc finger, Bromodomain, [PHD], KAT11, ZZ zinc finger, TAZ zinc finger | PITG_07302 |
| KAT11 domain histone acetyltransferase (B) | [TAZ, TAZ], Bromodomain, [DUF902], [PHD], KAT11, [ZZ, TAZ] | PITG_06533 |
| KAT11 domain histone acetyltransferase (C) | Bromodomain, [PHD], KAT11 | PITG_18027 |
| KAT11 domain histone acetyltransferase (D) | Bromodomain, KAT11 | PITG_08587 |
| Histone deacetylase HDA1 | histone deacetylase | PITG_01897 |
| Histone deacetylase HDA2 | [ankyrin repeats], histone deacetylase | PITG_08237 |
| Histone deacetylase HDA4 | histone deacetylase | PITG_05176 |
| Histone deacetylase HDA5 | histone deacetylase | PITG_15415 |
| Histone deacetylase HDA6 | histone deacetylase | PITG_21309 |
| Histone deacetylase HDA7 | histone deacetylase | PITG_12962 |
| Histone deacetylase HDA8 | histone deacetylase | PITG_01911 |
| Histone deacetylase HDA9 | histone deacetylase | PITG_04499 |
| DOT1-like histone methyltransferase | DOT1 | PITG_00145 |
| Histone-lysine N-methyltransferase | Bromodomain, PHD-like zinc-binding (zf-HC5HC2H), F/Y-rich N-terminus, SET | PITG_20502, PITG_… |
| Protein methyltransferase w/bicoid | Methyltransferase, bicoid-interacting protein 3 | PITG_14915 |
| SLIDE domain-containing protein (A) | DUF1898, SNF2 N-terminal, Helicase conserved C-terminal, SLIDE, [myb-like DNA-binding], HMG box | PITG_02286 |
| SLIDE domain-containing protein (B) | SNF2 N-terminal, Helicase conserved C-terminal, [HAND], SLIDE | PITG_17273 |
| SSRP1 subunit, FACT complex | Structure-specific recognition protein, Histone chaperone Rttp106-like | PITG_14260 |
| RNA methylation | | |
| FtsJ-like rRNA methyltransferase (A) | FtsJ-like methyltransferase | PITG_09405 |
| FtsJ-like rRNA methyltransferase (B) | FtsJ-like methyltransferase | PITG_06848 |
| FtsJ-like rRNA methyltransferase (C) | FtsJ-like methyltransferase | PITG_16337 |
| Spb1-like rRNA methyltransferase | FtsJ-like methyltransferase, DUF3381, Spb1 C-terminal domain | PITG_00663 |
| Guanosine 2'-O tRNA methyltransferase | CCCH zinc finger, U11-48K CHHC zinc finger, TRM13 methyltransferase | PITG_04858 |
| N2,N2-dimethylguanosine tRNA methyltransferase | N2,N2-dimethylguanosine tRNA methyltransferase (TRM) | PITG_10166 |
| MnmA-like tRNA 2'-thiouridylase | tRNA methyltransferase | PITG_08823 |
| RNA silencing | | |
| Argonaute (A) | DUF1785, PAZ, Piwi | PITG_04470, PITG_04471 |
| Argonaute (B) | DUF1785, PAZ, Piwi | PITG_01400, PITG_01443, PITG_01444 |
| Dicer-like | [DEAD/H box helicase], dsRNA binding, 2x RNase III domains | PITG_09292 |
| Drosha-like | 2x RNase III domains, [dsRNA binding] | Psojae_300435 |
| dsRNA-binding protein | dsRNA binding | PITG_12183 |
| dsRNA-binding protein w/ Bin3 | [methyltransferase], dsRNA binding, Bicoid-interacting 3 | PITG_03262 |
| RNase III domain protein | RNase III domain, [dsRNA binding] | PITG_08831 |
| RNA-dependent RNA polymerase | DEAD/H box helicase, Helicase conserved C-terminal, RdRP domain, [NTP transferase] | PITG_10457 |
| Transcription factors | | |
| Histone-like CBF/NF-Y | CBF/NF-Y [CENP-S associated centromere protein X] | PITG_00914 |
| Histone-like CBF/NF-Y w/HMG | HMG, CBF/NF-Y | PITG_19530 |
| Med17 subunit of Mediator complex | Med17 | PITG_03899 |
| p15 transcriptional coactivator | 2x PC4 | PITG_07058 |
| TFIIB | TFIIB zinc-binding, 2x TFIIB | PITG_14596 |
| TFIID, TATA-binding protein (A) | 2x TBP | PITG_07312 |
| TFIID, TATA-binding protein (B) | 2x TBP, [DUF3378] | PITG_12304 |
| TFIID, TATA-binding protein (C) | TBP, [2x DUF3378], TBP | PITG_06201 |
| TFIID, TAF1 subunit | DUF3591, Bromodomain | PITG_02547 |
| TFIID, TAF2 subunit | Peptidase M1, [HEAT repeat] | PITG_18882, PITG_14044 |
| TFIID, TAF5 subunit | TFIID 90 kDa, 5x WD domain | PITG_16023 |
| TFIID, TAF6 subunit | TAF, DUF1546, [HEAT repeat] | PITG_03978 |
| TFIID, TAF8 subunit | Bromodomain (histone-like fold), TAF8 C-terminal | PITG_18355 |
| TFIID, TAF9 subunit | TFIID 31 kDa | PITG_04860 |
| TFIID, TAF10 subunit | TFIID 23-30 kDa | PITG_07637, PITG_14668 |
| TFIID, TAF12 subunit | TFIID 20 kDa | PITG_00683 |
| TFIID, TAF14 subunit | YEATS | PITG_01229 |
| TFIIE, alpha subunit | TFIIEalpha | PITG_08403 |
| TFIIF, alpha subunit | TFIIFalpha | PITG_… |
| TFIIF, beta subunit | TFIIFbeta | PITG_10081 |
| TFIIH, Rad3 subunit | DEAD 2, DUF1227, Helicase C-terminal | PITG_15696 |
| TFIIH, Ssl1 subunit | Ssl1-like, TFIIH C1-like | Psojae_345458 |
| TFIIH, Tfb1 subunit | [TFIIH p62 N-terminal], BSD | PITG_03523 |
| TFIIH, Tfb2 subunit | Tfb2 | PITG_15486 |
| TFIIH, Tfb4 subunit | Tfb4 | PITG_00220 |
| TFIIIB, Brf1-like subunit | TFIIB zinc-binding, 2x TFIIB, Brf1-like TBP-binding | PITG_16669 |

^a Domains in brackets indicate missing or non-significant matches in some species.
^b Reference sequences from P. infestans T30-4 (PITG) or P. sojae P6497 v3.0.

Divergence time analyses

Robust orthology relationships could be determined for 52 of the initial 70 datasets; 40 of these
datasets contained minimal missing data and were used to estimate divergence times (see Additional
file 1 for a list of genes included in the analysis). Calibration priors were modeled with a gamma
distribution in order to assign higher probabilities to divergence times somewhat older than the
hard bound (offset value); initial tests with lognormal priors produced very similar divergence
times (data not shown). Five independent analyses of 50 million generations each were run under
each of the three models, with the random local clock model being the most computationally
intensive. Strict clock and UCLD analyses run on an iMac (OS X 10.8.5) desktop with a 2.7 GHz Intel
Core i5 processor took approximately seven days. Random local clock analyses run on a Linux
(Mint 14) desktop with a 3.3 GHz Xeon quad-core processor took approximately 30 days. Posterior
distributions on parameters were identical across all five runs under the strict clock model.
Parameter distributions were consistent and overlapping for all five runs under the UCLD model,
with only one run deviating for the estimate of the root height (700 Ma versus approximately
500 Ma in the other four runs); however, all runs showed weak evidence of convergence even after 50
million generations. One run under the random local clock model failed to converge; of the four
successful runs, parameter distributions were consistent and overlapping, with only one run
deviating for the rate estimate (1.76 x 10^-3 versus 1.88 x 10^-3 for the other three runs). Log
and tree files for the two of the five runs with the highest effective sample size (ESS) for the
likelihood parameter were then combined; under the strict clock model, all five runs performed
equally, so the first two runs were combined. Analyses run without data (Prior Only) resulted in
time estimates that were markedly different from those obtained with the full dataset for the
majority of nodes (Table 3), suggesting that our divergence time estimates were driven by the data
themselves and not by settings on the calibration priors. Divergence times among oomycete lineages
were consistent among all three models (Table 3); however, estimates under the UCLD model may have
been influenced by poor mixing, as several parameters showed ESS values less than 200 (Tables 3
and 4). The resulting timetree suggests an origin for oomycetes in the mid-Paleozoic, with a
divergence between two major lineages, the peronosporaleans and saprolegnialeans, in the early
Mesozoic (Figure 1). A complete list of divergence times with 95% confidence intervals for each
node under each model is presented in Additional file 2.
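The idea of a gamma calibration prior with a hard minimum bound (offset) can be sketched as follows. The offset of 408 Ma corresponds to the age of the Rhynie Chert given in the text; the shape and scale values are hypothetical, since the actual prior settings are not stated here, and the function names are illustrative only.

```python
import math
import random


def offset_gamma_pdf(age, offset, shape, scale):
    """Density of a gamma prior shifted so that no mass lies below `offset`."""
    x = age - offset
    if x <= 0:
        # Ages at or below the hard bound receive zero density
        # (the value at exactly the bound is set to 0 for simplicity).
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def sample_offset_gamma(n, offset, shape, scale, rng):
    """Draw n ages from the shifted gamma prior."""
    return [offset + rng.gammavariate(shape, scale) for _ in range(n)]


# Offset from the text (Rhynie Chert); shape and scale are hypothetical.
OFFSET, SHAPE, SCALE = 408.0, 2.0, 20.0
mode = OFFSET + (SHAPE - 1) * SCALE  # most probable age when shape > 1

for age in (400.0, 410.0, mode, 500.0):
    print(age, offset_gamma_pdf(age, OFFSET, SHAPE, SCALE))
```

With a shape parameter greater than 1, the highest prior density falls somewhat older than the bound itself rather than at the fossil age, which is the behavior described for the calibration priors above.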

Discussion 

Models for estimating divergence times under a molecular clock have become more complex over the
past two decades. In this study we have used three distinct models, a single-rate strict clock, a
UCLD relaxed clock, and a random local clock, to estimate divergence times among the fungal-like
oomycetes. Analyses run under the strict clock model performed robustly, with all parameters
showing evidence of thorough sampling (ESS >> 1000) and chain convergence. Because we had no a
priori expectation of rate homogeneity among oomycetes or between oomycetes and ochrophytes, we
also estimated divergence times under "relaxed" clock models. Both the UCLD and random local clock
models indicated moderate to high levels of rate variation among lineages (as shown by the
coefficient of variation parameter, Table 4), suggesting that a strict clock model was not
appropriate for our dataset regardless of performance of the MCMC. In addition, an analysis of
Bayes factors suggested that the two relaxed clock methods were a better fit for the data (ln
Bayes factor in favor of relaxed clock models over strict clock >100). Rates estimated under the
UCLD model appeared to be strongly influenced by the calibration priors, leading to rates 1.5 to
3.5 times higher in the ochrophyte lineages than in the oomycetes (data not shown). However, UCLD
analyses failed to converge even after 50 million generations, thus limiting our ability to
interpret parameter and divergence time estimates. Only a few parameters showed signs of poor
mixing in the random local clock analyses (ESS < 200), but in general there was good evidence of
chain convergence under this model, with the trade-off of long computational times.

Table 3 Median divergence times (in Ma) for select nodes estimated under the three molecular clock models

| Node^a | Strict clock, prior only | Strict clock, full dataset | UCLD relaxed clock, prior only | UCLD relaxed clock, full dataset | Random local clock, prior only | Random local clock, full dataset |
|---|---|---|---|---|---|---|
| a | 171.2 | 26.6 | 171.5 | 26.7* | 170.6 | 23.4* |
| b | 271.9 | 139.9 | 271.9 | 119.8* | 271.4 | 134.6 |
| c | 167.5 | 67.0 | 167.0 | 71.6* | 167.1 | 75.1 |
| d | 363.6 | 197.2 | 363.7 | 191.0* | 363.1 | 214.1 |
| e | 83.1 | 180.1 | 83.0 | 97.5* | 83.0 | 139.4 |
| f | 183.8 | 364.4 | 183.7 | 191.0 | 183.8 | 334.1 |
| g | 447.4 | 414.7 | 447.4 | 424.8 | 447.6 | 415.6 |
| h | 526.6 | 545.3 | 527.04 | 475.0* | 527.1 | 533.9 |

^a Node as shown in Figure 1. Asterisks (*) indicate ESS values < 200. A complete list of divergence
times with 95% confidence intervals is presented in Additional file 2.
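The ESS threshold of 200 used to flag poor mixing can be made concrete with a rough estimate of effective sample size from an MCMC trace. The sketch below divides the chain length by an autocorrelation time obtained by summing sample autocorrelations up to the first non-positive lag. This truncation rule is an assumption made for illustration and is not necessarily the estimator used by the software in this study; the function name is hypothetical.

```python
import random


def effective_sample_size(trace):
    """Rough ESS: N / (1 + 2 * sum of positive-lag autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # a constant trace carries no autocorrelation signal
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Simulated traces (not data from this study): an independent chain
# and a strongly autocorrelated AR(1) chain of the same length.
rng = random.Random(11)
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

print(round(effective_sample_size(independent)))
print(round(effective_sample_size(sticky)))
```

A chain that moves slowly through parameter space yields far fewer effectively independent samples than its raw length suggests, which is why a long run can still report ESS values below 200.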

Despite differences in performance among the three clock models, divergence time estimates among
oomycetes were strikingly consistent (Table 3 and Additional file 2), and all models estimated a
mid-Paleozoic origin for oomycetes (Figure 1). Our estimate for the divergence of oomycetes from
other stramenopiles is somewhat consistent with results from a study of ochrophyte evolution using
small subunit ribosomal DNA data [34], but is considerably younger than estimates generated from
broader studies of eukaryote evolution [31,32]. However, it seems likely that the times recovered
here for the divergence between oomycetes and ochrophytes, as well as the root node, may be
underestimated, for several reasons. A recent simulation study of relaxed clock models showed that
the deepest nodes in a tree tend to be underestimated when shallow calibrations are used [48],
which reflects our reliance on diatom calibrations to estimate divergences throughout the tree.
Also, the posterior distributions recovered for the ingroup (node g in Figure 1) and root (node h)
time estimates overlapped with their respective prior distributions, and were tightly constrained
by the lower limit of 408 Ma imposed by the priors (data not shown). In addition, the long branch
connecting the origin of oomycetes (node g) to the divergence between the peronosporaleans and
saprolegnialeans (node d), as well as the long branch in the calibration taxa (between nodes e and
f), may have influenced rate estimates under the UCLD and random local clock models. As a result,
divergence times estimated for these nodes were sensitive to the model, particularly the
ochrophyte estimates under the UCLD clock (Table 3 and Additional file 2); however, given the poor
performance of the UCLD analysis, it is difficult to assess the reliability of these estimates.
Additional sequence data from basal oomycetes such as Eurychasma dicksonii [49] and Haliphthoros
sp. [50], as well as from more ochrophyte calibration taxa, will help break up these long branches
and lead to more reliable rate estimates. The oldest accepted oomycete fossils come from the Lower
Devonian Rhynie Chert, which is thought to have been a non-marine hot spring environment [21,22].
Phylogenetic evidence suggests that the earliest diverging oomycetes were likely marine [15,20];
therefore, the origin of this group may have occurred some time prior to the appearance of fossils
in non-marine environments.

Table 4 Mean posterior values for select parameters estimated under the three molecular clock models

| Parameter | Strict clock, prior only | Strict clock, full dataset | UCLD relaxed clock, prior only | UCLD relaxed clock, full dataset | Random local clock, prior only | Random local clock, full dataset |
|---|---|---|---|---|---|---|
| Likelihood | n/a | -359159.81 | n/a | -358847.49 | n/a | -358852.83 |
| Posterior | n/a | -359328.94 | n/a | -358975.87 | n/a | -359061.08 |
| Yule.birthRate | 0.0052 | 0.0062 | 0.0052 | 0.0074 | 0.0052 | 0.0065 |
| Clock.rate | 0.9990 | 0.0018 | n/a | n/a | 0.9970 | 0.0019 |
| ucld.mean | n/a | n/a | 0.1000 | 0.0024* | n/a | n/a |
| ucld.stdev | n/a | n/a | 0.0999 | 0.5550* | n/a | n/a |
| CoefficientOfVariation | n/a | n/a | 0.0979 | 0.5370* | 0.1230 | 0.2380* |
| RateChangeCount | n/a | n/a | n/a | n/a | 0.6950 | 8.7050 |

Asterisks (*) indicate ESS < 200. n/a - not applicable.

Figure 1 Timetree of oomycete evolution. Divergence times shown were estimated under the random
local clock model. Vertical dashed lines indicate boundaries between geologic eras.

Fossil evidence of oomycetes also occurs throughout 
the Carboniferous, particularly in association with lyco- 
phytes (reviewed in [21]). While previous authors have 
suggested affinities with certain taxonomic groups (e.g., 
[25,26]), the divergence times estimated here indicate 
that modern peronosporalean and saprolegnialean line- 
ages originated much later, in the mid to late Mesozoic 
(Figure 1). Modern saprolegnialeans, such as Saprolegnia
parasitica, are commonly associated with freshwater environments,
and can be devastating pathogens of fish,
amphibians, crustaceans, and insects [45]; saprotrophic
species, such as Thraustotheca clavata, are also known
from this group. In contrast, modern peronosporaleans
are predominantly terrestrial and many are significant
plant pathogens. Two species included in our analysis,
Hyaloperonospora arabidopsidis [39] and Albugo laibachii
[37], are obligate biotrophs that are fully dependent
on their host (Arabidopsis). Phytophthora species cause
disease on a wide variety of plants, and significant effort 
has been undertaken to understand their mechanisms of 
virulence and host specificity (reviewed in [51]). While it 
is undesirable to extrapolate as to the likely hosts for 
early diverging lineages, it does seem reasonable to sug- 
gest that host availability was not a constraining factor 
in oomycete diversification. Particularly for the modern 
plant pathogenic oomycetes, both fossil and molecular 
clock evidence suggests that the major lineages of angio- 
sperms had diversified by the mid-Cretaceous [52], prior 
to our estimates for divergences among the peronospor- 
aleans. The evolution of pathogenic lifestyles, therefore, 
may have been in response to certain environmental 
changes, or may have been facilitated by the horizontal 
transfer of pathogenicity-related genes from true Fungi 
[53-55] or from bacteria [45,56], as has been suggested 
previously. 

In this study, we chose to focus on conserved regula- 
tors of eukaryotic gene expression to examine their pres- 
ence and level of conservation in pathogenic oomycetes. 
Mechanisms of gene expression regulation are highly
conserved across eukaryotes and were most likely
present in the last common ancestor, including epigenetic
and RNA-based processes for transcriptional and
post-transcriptional gene silencing [57-59]. Although we 
have not conducted an exhaustive survey here, our re- 
sults suggest that the common ancestor of oomycetes 
possessed a full complement of regulatory proteins, in- 
cluding those involved in histone modification, RNA 
interference, and tRNA and rRNA methylation. Surpris- 
ingly, no orthologs of canonical DNA methyltransferases 
could be identified in the genomes of oomycetes. A sin- 
gle putative DNA methylase is present in the genome of 
Pythium ultimum (T014901), but no orthologs could be 
detected in the other oomycete genomes. Gene silencing 
studies in Phytophthora infestans have failed to detect 
evidence of cytosine methylation [60,61]; however, recent
work in P. sojae does suggest the presence of methylated
DNA [62]. DNA methyltransferases also appear to be 
absent from the Ectocarpus genome [38], as well as from 
the model eukaryotes Saccharomyces cerevisiae and Caenorhabditis
elegans [63]; however, several are known
from diatoms [64,65]. Further study is therefore needed 
to confirm the presence and mechanism of DNA methy- 
lation in oomycetes. 

Conclusions 

This is the first study to estimate divergence times 
among the fungal-like oomycetes. The consistency of 
our time estimates under three distinct molecular clock 
models suggests that the resulting timetree likely re- 
covers the main divergences among lineages, which oc- 
curred in the mid to late Mesozoic. Our estimates for 
the origin of oomycetes and the divergence of strameno- 
piles from other eukaryotes may have been underesti- 
mated due to the limited fossil information available for 
the taxa included in this study. Additional information 
from the oomycete fossil record, especially from the di- 
verse Cretaceous assemblages, as well as new sequence 
data from basal oomycete lineages and other under- 
sampled eukaryotes [66], may help future molecular 
clock studies better estimate evolutionary rates. 

Methods 

Data mining 

Reference sequences for canonical eukaryotic transcrip- 
tion factors and proteins involved in post-transcriptional 
gene silencing, DNA and RNA methylation, and chroma- 
tin modification were obtained from the Gene Database 
at NCBI (http://www.ncbi.nlm.nih.gov/gene) for human, 
Drosophila, Saccharomyces, and/or Arabidopsis. The reference 
protein sequences were then used to search for homologs 
in the genome of Phytophthora infestans T30-4 
[42]. Additional reference sequences were also obtained 
from a study of gene silencing in P. infestans [36]. Both 
the eukaryotic reference sequences and the putative 
P. infestans homologs were used to search the available 
genomes of oomycetes, diatoms, and a brown alga (Table 1); 
outgroup sequences were obtained from Tetrahymena 
thermophila when available. All potential homologs of 
equivalent BLAST e-values within each genome were 
included for orthology assessment. 

Dataset assembly 

Protein domains were determined for all potential ho- 
mologs using Pfam [67]. Sequences that did not contain 
the appropriate domains for proper protein function 
were removed from each dataset except in cases where 
the protein sequence appeared truncated due to genome 
misannotation, particularly for Hyaloperonospora arabi- 
dopsidis. Each dataset was aligned under default settings 
in ClustalX v2 [68], and preliminary neighbor-joining 
phylogenies were generated under a Poisson correction 
with pairwise deletion of alignment gaps in MEGA v5 
[69]. Sequences within each dataset were considered 
orthologous if they shared protein domains and their 
phylogeny reflected known species relationships. In data- 
sets with species-specific paralogs, one sequence was ar- 
bitrarily chosen to represent the ortholog for divergence 
time estimation. In cases where orthology was ambigu- 
ous or no homolog could be identified, the sequence 
was coded as missing data. A complete list of protein ac- 
cession numbers per gene for each genome is available 
in Additional file 1. 
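
As an illustration of the distance measure behind the preliminary neighbor-joining trees, the sketch below computes a Poisson-corrected distance, d = -ln(1 - p), between two aligned protein sequences, skipping any site where either sequence has a gap (pairwise deletion). It is a minimal sketch and not the MEGA implementation; the function name, the set of characters treated as gaps, and the example sequences are assumptions made for illustration only.

```python
import math

# Assumption: these characters mark alignment gaps or missing data.
GAP_CHARS = {"-", "?"}


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Sites where either sequence has a gap are ignored (pairwise deletion).
    p is the proportion of differing sites among the compared sites,
    and the distance is d = -ln(1 - p).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no sites shared by both sequences")

    p = differences / compared
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not data from this study.
    print(poisson_distance("MKV-LA", "MKI-LA"))
```

Because gaps are removed pair by pair rather than across the whole alignment, a truncated sequence reduces the number of compared sites only for the pairs that include it.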

Divergence time analysis 

Protein datasets with robust orthology were used to co- 
estimate phylogeny and divergence times using Bayesian 
inference in BEAST v1.7.5 [70]. Initial runs of 10 million 
generations were used under each clock model to evalu- 
ate settings on priors and to generate a user tree for sub- 
sequent analyses. For the final analyses, each protein 
dataset was treated as a separate partition under a 
WAG substitution model; a Yule speciation process was 
assumed with a uniform distribution on the birthrate 
(0-100; initial value 0.01). For the strict clock analyses, 
the rate parameter (clock.rate) was modeled with an exponential 
prior distribution (mean 1.0, initial value 0.01). 
For the UCLD relaxed clock model, an exponential 
prior distribution (mean 0.1, initial value 0.01) was 
used for the mean rate (ucld.mean) and standard deviation 
(ucld.stdev). Several parameters control the rate and number 
of rate changes under the random local clock model; a 
Poisson distribution (mean 0.693) was used as the prior 
for the number of local clocks (rateChanges), an exponential 
prior distribution (mean 1.0, initial value 0.001) was 
used for the relative rates among local clocks 
(localClocks.relativeRates), and an exponential prior distribution 
(mean 1.0, initial value 0.01) was used for the rate (clock.rate). 
Five independent analyses were run for 50 million generations 
each, under all three clock models; log and tree files 
from the two runs with the highest parameter ESS values 
per model were combined (after removing burn-in from 
each run) using LogCombiner v1.7.5. Tracer v1.5 [71] was 
used to evaluate convergence, estimate the appropriate 
burn-in for each run, and calculate Bayes factors for 
model comparisons. Analyses were also repeated without 
data (priors only) to determine the impact of calibration 
settings on the resulting divergence time estimates; three 
independent runs of 50 million generations each were performed 
under each clock model. Trees were visualized in 
FigTree v1.4 [72]. 
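
The Poisson prior on the number of rate changes can be made concrete with a short calculation. The mean of 0.693 is close to ln 2, so the prior places roughly half of its probability on zero rate changes, which corresponds to a strict clock. The sketch below evaluates the Poisson probability mass for small counts; the function name is hypothetical and the code is independent of BEAST.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    prior_mean = 0.693  # prior mean on the number of rate changes, from the text
    for k in range(4):
        print(k, poisson_pmf(k, prior_mean))
```

Under this prior, support for one or more local clocks has to come from the sequence data rather than from the prior itself.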

Fossil evidence from diatoms and oomycetes was used 
to calibrate the molecular clock analyses; all calibrations 
were modeled with a gamma prior distribution (shape 
2.0) with the offset value set as the uppermost boundary 
of the time interval (stage) containing the relevant fossil. 
The value for the scale parameter was set so that the age 
at the 95% quantile was roughly equivalent to the lower- 
most boundary on the geologic epoch containing the 
relevant fossil. Appropriate geological times were ob- 
tained from the International Commission on Stratig- 
raphy chronostratigraphic chart, January 2013 version 
(http://stratigraphy.org). Fossil evidence from the Late 
Cretaceous (Campanian) pennate diatoms [73] provided 
a minimum age of 72.1 Ma on the divergence between 
Thalassiosira and Phaeodactylum (5-95% quantiles = 
74-100 Ma). Early Jurassic (Toarcian) diatom fossils [74] 
provided a minimum age of 174 Ma on the divergence 
between diatoms and Ectocarpus (5-95% quantiles = 
176-202 Ma). The Rhynie chert oomycete fossils [21] 
were used to define a minimum divergence time of 
408 Ma between oomycetes and ochrophytes (5-95% 
quantiles = 418-550 Ma). A wide uniform prior distribu- 
tion (408-1750 Ma; initial value 635 Ma) was used for the 
root age as there are few robust estimates on the divergence 
between alveolates and stramenopiles. BEAST XML-formatted 
data files have been deposited in Dryad [75]. 
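
The relationship between the offset, the scale, and the reported quantiles of each calibration prior can be checked with a short calculation. The sketch below assumes a gamma distribution with shape 2.0, whose cumulative distribution has the closed form F(x) = 1 - (1 + x/scale) exp(-x/scale), and solves for the scale that places the 95% quantile at a chosen age. All function names are hypothetical, and the scale values it returns are illustrative; the values actually used in the analyses are those in the deposited XML files.

```python
import math


def gamma2_cdf(x, scale):
    """CDF of a gamma distribution with shape 2.0, evaluated at x >= 0."""
    z = x / scale
    return 1.0 - (1.0 + z) * math.exp(-z)


def gamma2_quantile(prob, scale):
    """Quantile of the shape-2.0 gamma distribution, found by bisection."""
    lo, hi = 0.0, scale
    while gamma2_cdf(hi, scale) < prob:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma2_cdf(mid, scale) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scale_for_upper_age(offset, age_at_95):
    """Scale that puts the 95% quantile of the offset prior at age_at_95."""
    unit_q95 = gamma2_quantile(0.95, 1.0)  # quantiles grow linearly with scale
    return (age_at_95 - offset) / unit_q95


def calibration_quantiles(offset, scale):
    """Return the 5% and 95% quantiles of the offset gamma prior, in Ma."""
    return (offset + gamma2_quantile(0.05, scale),
            offset + gamma2_quantile(0.95, scale))


if __name__ == "__main__":
    # Minimum ages and 95% quantile ages (Ma) as given in the text.
    for name, offset, upper in [("pennate diatoms", 72.1, 100.0),
                                ("diatoms/Ectocarpus", 174.0, 202.0),
                                ("oomycetes/ochrophytes", 408.0, 550.0)]:
        scale = scale_for_upper_age(offset, upper)
        print(name, scale, calibration_quantiles(offset, scale))
```

Because the offset is a hard minimum, the prior assigns no probability to divergence times younger than the fossil, while the long right tail of the gamma distribution still allows considerably older ages.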

Availability of supporting data 

The data sets supporting the results of this article are 
available in the Dryad Digital Repository, http://dx.doi.org/10.5061/dryad.39mc5. 